Bidirectional LSTM-RNN for Improving Automated Assessment of Non-Native Children's Speech
نویسندگان
چکیده
Recent advances in ASR and spoken language processing have led to improved systems for automated assessment for spoken language. However, it is still challenging for automated scoring systems to achieve high performance in terms of the agreement with human experts when applied to non-native children’s spontaneous speech. The subpar performance is mainly caused by the relatively low recognition rate on non-native children’s speech. In this paper, we investigate different neural network architectures for improving non-native children’s speech recognition and the impact of the features extracted from the corresponding ASR output on the automated assessment of speaking proficiency. Experimental results show that bidirectional LSTM-RNN can outperform feed-forward DNN in ASR, with an overall relative WER reduction of 13.4%. The improved speech recognition can then boost the language proficiency assessment performance. Correlations between the rounded automated scores and expert scores range from 0.66 to 0.70 for the three speaking tasks studied, similar to the humanhuman agreement levels for these tasks.
منابع مشابه
TTS synthesis with bidirectional LSTM based recurrent neural networks
Feed-forward, Deep neural networks (DNN)-based text-tospeech (TTS) systems have been recently shown to outperform decision-tree clustered context-dependent HMM TTS systems [1, 4]. However, the long time span contextual effect in a speech utterance is still not easy to accommodate, due to the intrinsic, feed-forward nature in DNN-based modeling. Also, to synthesize a smooth speech trajectory, th...
متن کاملTowards Online-Recognition with Deep Bidirectional LSTM Acoustic Models
Online-Recognition requires the acoustic model to provide posterior probabilities after a limited time delay given the online input audio data. This necessitates unidirectional modeling and the standard solution is to use unidirectional long short-term memory (LSTM) recurrent neural networks (RNN) or feedforward neural networks (FFNN). It is known that bidirectional LSTMs are more powerful and ...
متن کاملAutomatic scoring of non-native children's spoken language proficiency
In this study, we aim to automatically score the spoken responses from an international English assessment targeted to non-native English-speaking children aged 8 years and above. In contrast to most previous studies focusing on scoring of adult non-native English speech, we explored automated scoring of child language assessment. We developed automated scoring models based on a large set of fe...
متن کاملBidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection
In this study, we propose a new method of polyphonic sound event detection based on a Bidirectional Long Short-Term Memory Hidden Markov Model hybrid system (BLSTM-HMM). We extend the hybrid model of neural network and HMM, which achieved stateof-the-art performance in the field of speech recognition, to the multi-label classification problem. This extension provides an explicit duration model ...
متن کاملAutomated Word Stress Detection in Russian
In this study we address the problem of automated word stress detection in Russian using character level models and no partspeech-taggers. We use a simple bidirectional RNN with LSTM nodes and achieve the accuracy of 90% or higher. We experiment with two training datasets and show that using the data from an annotated corpus is much more efficient than using a dictionary, since it allows us to ...
متن کامل